home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
PC World Interactive 7
/
PC World Interactive 7.iso
/
program
/
cprog.EXE
/
CC_1.ZIP
/
SMALLCV2.DOC
< prev
next >
Wrap
Text File
|
1980-01-10
|
23KB
|
464 lines
.* My thanks to Dick Dievendorff for fixing up this file to format with
.* GML. Now, if someone could do the same with SCLIB DOC . . .
:gdoc.
:frontm.
:titlep.
:title.Small-c Version 2.0
:title.for the IBM Personal Computer
:title.Release 1.01
:date.
:author.Daniel R. Hicks
:address.
:aline.
:aline.RCH38DB(HICKS)
:eaddress.
:etitlep.
:preface.
:p.Portions of Small-c code Copyright 1982 by J. E. Hendrix.
:p.Converted to the IBM Personal Computer by D. R. Hicks.
:p.This document was developed entirely by D. R. Hicks, and is
:hp1.not:ehp1. copyrighted.
:body.
:h1.Why Small-c?
:p.
Why bother with a limited version of the C programming language when
complete versions of the language are available? The main reason is
price. The cheapest "full" C implementations run about $75, and
prices increase from there up to the $500 range. Small-c is free, and
this means that many users who would not want to pay for a full
implementation will have access to this one. Other users who may not
be sure that they want pay for a full C implementation can use this
one to determine if they like the language.
:p.
Another reason for the existence of Small-c is control: You get all
of the source for Small-c, and you can, if you wish, maintain or
modify it yourself. You can also change the I/O support package to
suit your needs, and you can even port the compiler to a different
processor if you wish (this version is converted from an 8080
version).
:h1.What can Small-c do?
:p.
Small-c takes a source file, consisting of C-subset statements, and
creates an assembler source file. This assembler source file can be
processed by either the IBM Small Assembler or the IBM Macro Assembler
to produce an IBM/Intel format OBJ file. The OBJ file can, in turn,
be combined with other OBJ files and LIB files via the DOS LINK
function to produce an executable EXE module. An I/O & utility
function library is provided for this purpose, implementing an
environment which quite closely mimics the UNIX environment.
:p.
This version of Small-c can be executed on an IBM PC with 128K of
storage, one single-sided diskette drive, and running IBM DOS 1.1 or
2.0. In all likelihood, the compiler will execute on a 96K system,
but this has never been tried. Since at least 96K is needed to run
either of the assemblers, there is little merit in executing in less
than 96K. Although dual diskette drives are necessary for large
compilations (where the assembler source can occupy most of one
diskette), they are useful but not essential otherwise. Where a hard
file is available, this may be used in place of diskettes. The
compiler runs equally well on DOS 1.1 and DOS 2.0, although it does
not utilize the increased function of DOS 2.0. Release 1.01 has added
up to 2 dimensional arrays of all supported data types (D. Lang 12/90)
:p.
Programs created with the compiler (and suitably assembled and linked)
can be executed on any IBM DOS system with sufficient storage.
Normally, the programs must be linked with the associated Small-c
library which, in addition to providing I/O support, sets up the
operating environment. However, a reasonably adept programmer should
be able to link Small-c procedures with procedures from other
compilers, if necessary coding assembler "glue" via the the Small-c
"#asm" statement.
:p.
Although it has never (to my knowledge) been attempted, there is no
known reason why this version of Small-c (and the programs it
compiles) should not execute equally well on any suitably configured
IBM PC-compatible MS DOS system.
:h1.What WON'T it do?
:p.
Small-c was originated by Ron Cain, and was originally published in
Dr. Dobb's Journal, number 45. The original intent was to provide
users of small microprocessor systems with a compact yet powerful
systems programming language, one which could be both used and
maintained on a small system. For this reason, many of the features
of full C implementations were omitted, resulting in a restricted but
usable language. Small-c Version 2.0 was developed by J. E. Hendrix
and published in Dr. Dobb's Journal, numbers 74 & 75. This version,
slightly larger to account for the tendency toward larger systems, has
fewer restrictions and is considerably more powerful. Nonetheless, it
has significant restrictions relative to full C implementations. The
following are the principle restrictions:
:ol.
:li.Structures and unions are not supported.
:li.Up to 2 dimensions arrays supported in Release 1.01
:li.Floating point is not supported.
:li."Long" integers are not supported.
:li.Only functions returning integers are supported.
:li.Pointers to pointers, arrays of pointers, and several of the other
exotic forms of declarations are not supported.
:eol.
:h1.Running Small-c
:p.
The compiler may be invoked with the parameters specified on the
command line, with prompting for the parameters, or a combination of
the two. To invoke the compiler with prompting, type:
:sl.
:li.CC
:esl.
:pc.The compiler will display the prompt:
:sl.
:li.Input file [CON.C]:
:esl.
:pc.Type in the name of the file containing your C program. A
filename extension of "C" is assumed. If you enter no filename, input
will be accepted from the keyboard.
:p.The compiler will then display the prompt:
:sl.
:li.Output file [CON.ASM]:
:esl.
:p.
Type in the file name of the file to contain the assembler source. A
filename identical to the source file name and an extension of "ASM"
is assumed.
:p.
The compiler will then display the prompt:
:sl.
:li.Listing file [NUL.LST]:
:esl.
:p.
Normally, the listing file is either suppressed (by routing to the NUL
device) or is routed to the printer (by specifying "prn", for
example). If no filename is specified, NUL is assumed.
:pc.
The compiler then asks the following yes/no questions:
:sl.
:li.Interleave C source?
:li.Monitor function headers?
:li.Sound alarm on errors?
:li.Pause on errors?
:esl.
:p.
"Interleave C source" means to place the C source statements into the
assembler source as comments. This is very useful if the assembler
source is to be examined or modified after compilation, but it
increases the size of the assembler source file (by slightly more than
the size of the C source file).
:p.
"Monitor function headers" means to display each function header (and
include statement) as it is processed. This is useful both for
monitoring the progress of the compilation and for interpreting the
context of error messages from the compiler. This question is not
asked (and function headers and include statements are not displayed)
if output is being routed to the display screen.
:p.
"Sound alarm on errors" means to sound the PC's "bell" (beeper)
whenever an error message is displayed (also, when the compilation
ends). This is useful for long compilations where one might wish to
leave the computer unattended but be alerted when attention was
needed.
:p.
"Pause on errors" means to stop after displaying each error message
and wait for an "ENTER" before proceeding. This prevents the error
message from being scrolled off the screen before it can be examined.
:p.
After these questions have been answered, the compilation will begin.
C source statements will be read from the input file and written to
the output file. Function headers and include statements will be
displayed if selected. When the compilation is finished, the
following message is displayed (where n is a number):
:sl.
:li.There were n errors in this compilation
:esl.
:p.
If the number of errors is zero (or if you are willing to accept
whatever the errors are that have occurred), you may process the
produced assembler source with ASM (IBM Small Assembler) or MASM (IBM
Macro Assembler). Once the program has been successfully compiled and
assembled, it should be linked, using the DOS LINK program, with the
CC.LIB library supplied, thereby producing the executable EXE file.
:p.
When parameters are specified via the DOS command line, the file names
must be entered in order -- source file, assembler file, listing file
-- followed by or interspersed with the processing options. Roughly,
the syntax is:
:sl.
:li.CC [<source_file> [<assembler_file> [<listing_file>
]]][-[n]<option> . . .][;]
:esl.
:p.
Each file name, option, or the ";", must be separated by spaces from
adjacent entries. The options are identified by a leading "-" and
contain an optional "n" which indicates the "not" of the option. The
options are:
:sl.
:li.i -- interleave C source
:li.m -- monitor function headers
:li.a -- sound alarm on errors
:li.p -- pause on errors
:esl.
:p.
The ";" indicates that defaults are to be taken for the remaining
options or file names (rather than prompting for them).
:h1.Data representations
:p.
Small-c recognizes seven different data types:
:sl.
:li.Integers
:li.Characters
:li.Integer arrays
:li.Character arrays
:li.Integer pointers
:li.Character pointers
:li.Integer functions
:esl.
:pc.
No other combinations are recognized.
:p.
Integers are signed binary numbers ranging between -32768 and +32767
and occupying 16 bits (two bytes) of data storage on a byte boundary.
The low-order eight bits of the number is stored in the byte with the
lower address.
:p.
Characters are signed binary numbers ranging between -128 and +127 and
occupying 8 bits (one byte) of data storage on a byte boundary.
:p.
Integer arrays can be up to a maximum of 2 dimensions of 16 bit
signed integers. Each element occupies two bytes of data storage on a
byte boundary, with the low-order eight bits of each element occupying
the byte with the lower address. The first element (the one
corresponding to an index value of zero) occupies the two-byte area
with the lowest address, the next element occupies the adjacent higher
address, etc.
:p.
Character arrays can be up to a maximum of 2 dimensions of 8 bit
signed integers. Each element occupies one byte of data storage on a
byte boundary. The first element (the one corresponding to an index
value of zero) occupies the one-byte area with the lowest address, the
next element occupies the adjacent higher address, etc.
:p.
Integer and character pointers are both 16 bit unsigned numbers,
ranging between 0 and 65535 and occupying two bytes of data storage on
a byte boundary. Like integers, the low-order eight bits of the
number is stored in the byte with the lower address. The only
distinction between the two is in the semantics of their use:
Dereferencing (via "*") an integer pointer causes a two byte quantity
to be fetched or stored, while dereferencing a character pointer
causes a one byte quantity to be fetched and stored. Similarly,
incrementing (decrementing) an integer pointer causes two to be added
to (subtracted from) the pointer, while incrementing (decrementing) a
character pointer causes one to be added to (subtracted from) the
pointer.
:p.
Functions are variable length areas of code storage beginning on a
byte boundary. Functions are presumed to consist of 8088 instructions
arranged in a sequence which is meaningful and which conforms to
Small-c linkage and usage protocols.
:p.
The C language is fairly unique among "modern" languages in that the
value of a named entity (i.e., variable or function) is the contents of
the entity only when the entity is a scalar. In other cases, the
value of the entity is its address. Thus, the value of an integer or
character array is the offset in the data segment to the first element
(the one with an index of zero) and the value of a function (when its
name is not followed by a parameter list) is the offset in the code
segment to the entry point of the function.
:p.
Since C is a loosely-typed language, much of the semantics of a named
entity is dependent upon the context of its use. Any of the above
data types may be dereferenced with a "*", for instance (although
dereferencing a character variable or a function name is guaranteed to
be meaningless -- and dangerous). Likewise, any of the above data
types may be treated as a function and called by appending a parameter
list. (In this case, character variables and array names are
meaningless -- and even more dangerous.)
:p.
Another unusual aspect of C is that character values are extended to
integer values when used in an expression, when passed as an argument,
or when referenced by a switch or return statement. The C language
standard permits this conversion to be done either via sign extension,
or by filling the high-order bits of the integer with zeros. Small-c
uses sign extension. Thus an expression such as:
:sl.
:li.c == 255
:esl.
:pc.(where c is a character variable) will never be true.
:h1.Language features and restrictions
:p.
In addition to the restrictions stated above, the following
restrictions hold:
:sl.
:li.Lower-case and upper-case symbols are synonymous.
:li.Local declarations at a block level and goto statements may not be
used in the same function.
:li.The sizeof operator is not supported.
:li.The cast operator is restricted to four cases:
:sl.
:li.(int)
:li.(char)
:li.(int *)
:li.(char *)
:esl.
:pc.No extraneous blanks are allowed within the cast operator.
:li.Initializers are only permitted on global or static declarations,
and only literal values may be used for initializers.
:esl.
:h1.Storage and linkage conventions
:p.
This version of Small-c utilizes the "small" storage model: The CS
register addresses a single code segment (of up to 64K), while the DS,
ES, and SS registers address a single data segment (also of up to
64K). The code segment comes first (lowest) in storage, followed
immediately by the data segment. Static storage and string constants
are allocated at the low end of the data segment, with a heap growing
up from the top of the statics, and a stack growing down from the top
of the data segment. The initialization code contained in C.LIB
initializes the data segment to be as large as available storage, up
to the 64K maximum. Although the function is so far unused, provision
is made for allocating I/O buffers above the data segment when extra
storage is available. In theory, the above allows a single program to
utilize up to 128K of storage, although, in practice, one will usually
use up the code segment roughly twice as fast as the data segment, so
that 96K (not counting DOS) is a more practical limit.
:p.
Subroutine linkage and automatic storage allocation utilizes the
stack. To call a subroutine, parameters are first pushed onto the
stack. In standard C fashion, scalar parameters (char or int) are
passed by value, while arrays and strings are passed by address. Char
and int values both occupy two bytes on the stack. Char values are
are sign-extended to 16 bits. Address values are also passed as
two-byte quantities: The interpretation of the address as an offset
into the code segment or and offset into the data segment depends on
the way it is used. (In fact, Small-c does not recognize pointers to
functions as a data type, but reference to a function name without
following "()" yields its offset into the code segment, and reference
to a variable followed by "()" results in a call to the location in
the code segment indicated by the variable.)
:p.
Parameters are pushed in order of
occurrence: The first parameter in
a list is the first one pushed and therefore the deepest one in the
stack. This is opposite the order of many C compilers, and it
prevents some C library functions (such as "printf") from being able
to determine how many parameters are present by examining the first or
second one. For this reason, the compiler, prior to a CALL, loads
register DL with the parameter count, thus allowing functions such as
"printf" to be implemented. This feature (the loading of DL) can be
disabled in cases where it is not needed, thereby generating more
compact code.
:p.
After the parameters have been pushed into the stack, the function is
called (as a NEAR procedure). This causes the return address to be
pushed onto the stack. The called function then saves the current BP
register value by pushing it onto the stack, loads BP from the current
SP register value, and increments SP by the size of automatic storage
needed. Thus, automatic storage can be addressed by using negative
offsets from BP, while parameters can be addressed using positive
offsets.
:p.
To return from a function, the result, if any is first loaded into the
AX register. Then, SP is loaded from BP (to pop any automatic storage
present), and the old value of BP is restored by popping it from the
stack. Finally, a NEAR return is executed which pops the saved return
address from the stack and returns to the calling function. It is the
responsibility of the calling routine to pop the parameters from the
stack (this is an area of possible incompatibility with other
languages).
:p.
Although the effect can be negated by the inappropriate use of global
or static variables, all code generated by this compiler is reentrant.
:h1.Execution environment
:p.
It is the intent of this implementation to mimic the standard UNIX C
operating environment as accurately as possible and reasonable, given
the limitations of Small-c and the difference in operating systems.
Thus, :hp1.main:ehp1. is passed a pair of parameters, as in UNIX C,
with the first parameter indicating the number of arguments, and the
second parameter pointing to an array of integers which can be cast
into character pointers to the arguments. These arguments are the
blank-separated arguments parsed from the DOS command line, except
that the first argument (which in UNIX C is the name of the program)
is always "main".
:p.
Also optionally parsed from the command line are redirections of
:hp1.stdin:ehp1. and :hp1.stdout:ehp1.. If an argument on the command
line is preceeded by the character "<", it is treated as the file name
for stdin rather than being added to the argument list. And if an
argument on the command line is preceded by the character ">", it is
treated as the file name for stdout rather than being added to the
argument list. (Caution: Under DOS 2.0, these arguments are
intercepted and the redirection is performed by DOS rather than by
C.LIB. Unfortunately, there are bugs in the DOS 2.0 redirection
support.) If not redirected, stdin, stdout, and stderr all refer to
the keyboard/display. stderr is not redirectable.
:h1.I/O system
:p.
It is the intent of this implementation to mimic the standard UNIX C
I/O interfaces as accurately as possible and reasonable, given the
limitations of Small-c and the difference in environment. Thus, both
I/O layers are implemented: The "standard I/O" layer and the UNIX
system call layer. The principle difference between the two layers is
not so much function as it is of performance: The standard I/O
interface provides an extra level of buffering which is useful to many
applications which process input or generate output a character at a
time, while the UNIX system call interface is more efficient when this
extra buffering is not needed. In addition, the UNIX system call
interface permits slightly more precise control over the I/O, while
the standard I/O interface functions are slightly easier to use when
processing character data.
:p.
As with UNIX C, access to the "standard I/O" layer generally requires
the user to include stdio.h, so that various values are defined and
various externals are declared. Also as with UNIX C, the include
errno.h contains error codes that are stored in the external variable
errno by the I/O routines.
:p.
The UNIX system call layer reserves the filenames "kbd", "scrn",
"con", and "prn" (all lower case) and treats files opened to these
names specially. The first three names are intercepted and routed to
the appropriate keyboard/ display interfaces, while the fourth is
routed to the printer. All other file names are passed to DOS to be
treated as DOS disk/diskette file names or DOS reserved file names.
:p.
One unusual aspect of this I/O system is that it provides a "bridge"
between the UNIX C file scheme which uses newline as a record
terminator, and the IBM/MS-DOS file scheme which uses Carriage
Return/Line Feed as a record terminator. This conversion is
implicitly performed for all I/O. If non-character data is to be
processed, this conversion must be disabled via :hp1.fbinary:ehp1.
or :hp1._ioctl:ehp1..
:p.
The implemented portions of the two levels of I/O interfaces are
described in a separate document. In general, the standard I/O
functions have the same names as their UNIX counterparts, while the
UNIX system call functions have underline ("_") prefixed on their
names. For more detail, users are advised to refer to
:cit.The Unix System:ecit. by S. R. Bourne, or
:cit.UNIX Programmer's Manual:ecit. by Bell Laboratories.
:h1.Debugging
:p.
This Small-c implementation contains very little in the way of
debugging aids. In many cases, this is of no significance, since the
C language has a fascinating tendency to produce programs which run
without error on the first attempt. However, since even the best
programmer will eventually produce errors if his programs are large
enough, some debugging facilities and techniques are usually
necessary. The standard IBM/MS-DOS facilities, combined with common
debugging techniques (such as debug "print" statements) and the few
facilities provided by the language, have proved adequate for
debugging thus far.
:p.
Someone who plans to use the compiler extensively should take some
time to familiarize himself with code generated by the compiler. This
will make it easier to associate code sequences from DEBUG
"unassemble" with the correct C source statements, thus reducing or
eliminating the need to obtain assembler listings.
:p.
When using LINK to produce a C program's EXE module, it is a good idea
to obtain a link MAP if debugging of the EXE module is likely to be
necessary. To do this, respond to the LINK "List File" prompt with
"PRN" or a file name AND specify the "/m" option so that the map is
really produced, rather than just obtaining a list of segment sizes.
This map can then be used when dumping global variables or setting
DEBUG breakpoints.
:egdoc.